AITopics | generalized lower bound q-learning


generalized lower bound q-learning


Self-Imitation Learning via Generalized Lower Bound Q-learning

Neural Information Processing Systems, Dec-24-2025, 09:31:53 GMT

Self-imitation learning motivated by lower-bound Q-learning is a novel and effective approach for off-policy learning. In this work, we propose an n-step lower bound which generalizes the original return-based lower-bound Q-learning, and introduce a new family of self-imitation learning algorithms. To provide a formal motivation for the potential performance gains provided by self-imitation learning, we show that n-step lower bound Q-learning achieves a trade-off between fixed point bias and contraction rate, drawing close connections to the popular uncorrected n-step Q-learning. Finally, we show that n-step lower bound Q-learning is a more robust alternative to return-based self-imitation learning and uncorrected n-step Q-learning over a wide range of benchmark tasks.
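
To make the abstract's idea concrete, here is a minimal Python sketch, not the authors' implementation. It assumes that the n-step target is the discounted sum of the first n rewards plus a bootstrapped value estimate, and that the lower-bound loss penalizes the Q-value only when it falls below that target, as in return-based self-imitation learning. The function names `n_step_target` and `lower_bound_loss` and the 0.5 scaling are illustrative choices, not taken from the paper.

```python
from typing import Sequence


def n_step_target(rewards: Sequence[float], bootstrap_value: float, gamma: float) -> float:
    """Discounted sum of the first n rewards plus a bootstrapped tail value.

    Assumption: n is len(rewards), and bootstrap_value is some estimate of
    the value at the state reached after those n steps.
    """
    target = 0.0
    for i, reward in enumerate(rewards):
        target += (gamma ** i) * reward
    return target + (gamma ** len(rewards)) * bootstrap_value


def lower_bound_loss(q_value: float, target: float) -> float:
    """Squared positive part of (target - q_value).

    The loss is zero when q_value already exceeds the target, so the
    target only ever pushes the estimate up (hypothetical form).
    """
    gap = max(target - q_value, 0.0)
    return 0.5 * gap ** 2


# Three steps of reward 1.0, bootstrapped value 10.0, discount 0.5.
target = n_step_target([1.0, 1.0, 1.0], bootstrap_value=10.0, gamma=0.5)
loss_below = lower_bound_loss(q_value=2.0, target=target)  # estimate under the bound
loss_above = lower_bound_loss(q_value=4.0, target=target)  # estimate over the bound
```

Under these assumptions, passing a full episode's rewards with `bootstrap_value=0.0` reduces the target to the plain discounted return, which is the return-based case that the n-step bound generalizes.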

generalized lower bound q-learning, q-learning, self-imitation learning, (4 more...)


Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)


Review for NeurIPS paper: Self-Imitation Learning via Generalized Lower Bound Q-learning

Neural Information Processing Systems, Jan-27-2025, 03:52:26 GMT

Weaknesses: The performance improvement is incremental and needs further evaluation. For example, each experiment should be run over 5 random seeds instead of 3 for a more accurate comparison. Moreover, the proposed method shows clear improvement in only 3 of the 8 environments in Figure 2. More baseline methods, such as SAC, should also be considered. Finally, how does the generalized SIL compare to SIL on the Montezuma's Revenge task?

generalized lower bound q-learning, neurips paper, self-imitation learning, (2 more...)


Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.76)


Review for NeurIPS paper: Self-Imitation Learning via Generalized Lower Bound Q-learning

Neural Information Processing Systems, Jan-27-2025, 03:52:18 GMT

The author response provided satisfactory answers to the reviewers' concerns about the contraction/bias trade-off, the disconnect between the experimental results and the theory, and the variance of the estimator. This led one reviewer to increase their score for the paper, which already had reasonably solid scores.

generalized lower bound q-learning, neurips paper, self-imitation learning, (1 more...)


Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.76)


Self-Imitation Learning via Generalized Lower Bound Q-learning

Neural Information Processing Systems, Oct-10-2024, 23:14:51 GMT

generalized lower bound q-learning, q-learning, self-imitation learning, (1 more...)


Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
